Gaussian Mixture Model Based Methods for Virtual Microphone Signal Synthesis

Authors

  • Athanasios Mouchtaris
  • Shrikanth S. Narayanan
  • Chris Kyriakakis
Abstract

Multichannel audio can immerse a group of listeners in a seamless aural environment. However, several issues must be addressed, such as the excessive transmission requirements of multichannel audio, as well as the fact that to date only a handful of music recordings have been made with multiple channels. Previously, we proposed a system capable of synthesizing the multiple channels of a virtual multichannel recording from a smaller set of reference recordings. In this paper these methods are extended to provide a more general coverage of the problem. The emphasis here is on time-varying filtering techniques that can be used to enhance particular instruments in the recording, which is desired in order to simulate virtual microphones in several locations close to and around the sound source.

INTRODUCTION

Multichannel audio can enhance the sense of immersion for a group of listeners by reproducing the sounds that would originate from several directions around the listeners, thus simulating the way we perceive sound in a real acoustical space. However, several key issues must be addressed. Multichannel audio imposes excessive requirements on the transmission medium.

Fig. 1: An example of how microphones may be arranged in a recording venue for a multichannel recording. In the virtual microphone synthesis algorithm, microphones A and B are the main reference pair from which the remaining microphone signals can be derived. Virtual microphones C and D capture the hall reverberation, while virtual microphones E and F capture the reflections from the orchestra stage. Virtual microphone G can be used to capture individual instruments such as the tympani. These signals can then be mixed and played back through a multichannel audio system that recreates the spatial realism of a large hall.

A system we previously proposed [7, 8] attempted to address this issue by offering the alternative of resynthesizing the multiple channels of a multichannel recording from a smaller set of signals (e.g. the left and right ORTF microphone signals in a traditional stereophonic recording). The solution provided, termed multichannel audio resynthesis, concentrated on the problem of enhancing a concert hall recording and divided it into two parts, depending on the characteristics of the recording to be synthesized. Given the microphone recordings from several locations in the venue (stem recordings), our objective was to design a system that can resynthesize these recordings from the reference recordings. These resynthesized stem recordings are then mixed in order to produce the final multichannel audio recording.

The recordings were distinguished according to the location of the microphone in the venue, resulting in two categories: reverberant and spot microphone recordings. For simulating recordings of microphones placed far from the orchestra (reverberant microphones), infinite impulse response (IIR) filters were designed from existing multichannel recordings made in a particular concert hall. These IIR filters were shown to be capable of recreating the acoustical properties of the venue at specific locations. In order to simulate virtual microphones in several locations close to and around the orchestra (spot microphones), it is important to design time-varying filters that can track and enhance particular musical instruments and diminish others.
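To make the reverberant-microphone case concrete, the following is a minimal sketch of applying such a fixed IIR filter to a reference channel in order to approximate a reverberant virtual-microphone signal. The file name, filter order, and coefficient values are hypothetical placeholders; in the approach described above, the filter coefficients are estimated from existing stem recordings made in the target hall.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import lfilter

# Load one of the reference channels (hypothetical mono file name).
fs, reference = wavfile.read("reference_left.wav")
reference = reference.astype(np.float64)

# Hypothetical IIR coefficients; in practice they would be designed from
# reverberant stem recordings made in the particular concert hall.
b = np.array([0.02, 0.0, -0.02])   # numerator (feed-forward) coefficients
a = np.array([1.0, -1.7, 0.72])    # denominator (feedback) coefficients, stable poles

# Time-invariant filtering of the reference channel approximates the
# reverberant virtual-microphone signal.
virtual_reverberant = lfilter(b, a, reference)
```

Because such a filter is time-invariant, the same coefficients can in principle be reused on other reference recordings, which is the property exploited for the synthesis problem discussed next.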
In this paper, we address the more general problem of multichannel audio synthesis. The goal is to convert existing stereophonic or monophonic recordings into multichannel recordings, given that to date only a handful of music recordings have been made with multiple channels. The same approach is followed as in the resynthesis problem. Based on existing multichannel recordings, we decide which microphone locations must be synthesized. For reverberant microphones, the filters designed in the resynthesis problem can be readily applied to arbitrary recordings. Their time-invariant nature offers the advantage that these filters can be applied to various recordings even though they were designed from a given recording. In contrast, the time-varying nature of the methods designed for spot microphone resynthesis prohibits us from applying them to an arbitrary recording. This is the problem we focus on in this paper. The next section outlines the spectral conversion method employed for the resynthesis problem; it is followed by a section on the adaptation method that allows these conversion parameters to be used with an arbitrary recording (the synthesis problem). Finally, the algorithms described are validated by simulation results, and possible directions for future research are given.

SPECTRAL CONVERSION

The approach followed for spot microphone resynthesis is based on spectral conversion methods that have been successfully employed in speech synthesis applications [1, 12, 5]. A training data set is created from the existing reference and target recordings by applying a short sliding window and extracting the parameters that model the short-term spectral envelope (in this paper we use the cepstral coefficients [9]). This set is created based on the parts of the target recording that must be enhanced in the reference recording. If, for example, the emphasis is on enhancing the chorus of the orchestra, then the training set is created by choosing parts of the recording where the chorus is present. This procedure results in two vector sequences: [x1 x2 ... xn] of reference spectral vectors, and [y1 y2 ... yn] as the corresponding sequence of target spectral vectors. A function F(·) can be designed which, when applied to vector xk, produces a vector close in some sense to vector yk. Many algorithms have been described for designing this function (see [1, 12, 5, 2] and the references therein). In [8], the algorithms based on Gaussian mixture models (GMM, [12, 5]) were found to be very suitable for the resynthesis problem. According to GMM-based algorithms, a sequence of spectral vectors xk as above can be considered as a realization of a random vector x with a probability density function (pdf) that can be modeled as a GMM.
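In the standard GMM formulation, such a pdf is written as a weighted sum of M multivariate Gaussians, g(x) = Σ_{i=1..M} p(ωi) N(x; μi, Σi), where p(ωi) are the prior class probabilities and N(x; μi, Σi) denotes a normal distribution with mean vector μi and covariance matrix Σi; the full expression and its parameter estimation are not reproduced in this excerpt. The sketch below illustrates, under assumed parameter choices, the training-data construction and GMM modeling described above: a short sliding window is applied to a synthetic stand-in reference signal, low-order real-cepstrum coefficients are extracted as spectral-envelope parameters, and a Gaussian mixture is fitted to the resulting vectors. The window length, hop size, cepstral order, and number of mixture components are illustrative choices, not the authors' settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def cepstral_vectors(signal, frame_len=1024, hop=256, order=20):
    """Short sliding window -> low-order real-cepstrum coefficients per frame."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len] * window
        log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-12)  # log magnitude spectrum
        cepstrum = np.fft.irfft(log_mag)                       # real cepstrum
        frames.append(cepstrum[:order])                        # keep the envelope part
    return np.array(frames)

# Synthetic stand-in for a reference recording; in practice x would be computed
# from the reference channel and y from the corresponding target (spot) channel.
reference_signal = np.random.randn(48000)
x = cepstral_vectors(reference_signal)   # the [x1 x2 ... xn] sequence above

# Model the reference spectral vectors with a Gaussian mixture.  The conversion
# function F(.) would then be derived from the mixture posteriors together with
# joint reference/target statistics, a step not reproduced in this excerpt.
gmm = GaussianMixture(n_components=16, covariance_type="full").fit(x)
posteriors = gmm.predict_proba(x)        # p(class_i | x_k) for each frame
```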

Publication date: 2002